Saturation and rounding in multiply-accumulate blocks

ABSTRACT

Saturation and rounding capabilities are implemented in MAC blocks to provide rounded and saturated outputs of multipliers and of add-subtract-accumulate circuits implemented using DSP. These features support any suitable format of value representation, including the x.15 format. Circuitry within the multipliers and the add-subtract-accumulate circuits implements the rounding and saturation features of the present invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 10/783,829, filed on Feb. 20, 2004, which is hereby incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

The present invention relates to programmable logic resources and, more particularly, to programmable logic resources having digital signal processing (DSP) circuitry in which saturation and rounding are supported.

A programmable logic resource is a general-purpose integrated circuit that is programmable to perform any of a wide range of logic tasks. Known examples of programmable logic resource technology include programmable logic devices (PLDs), complex programmable logic devices (CPLDs), erasable programmable logic devices (EPLDs), electrically erasable programmable logic devices (EEPLDs), and field programmable gate arrays (FPGAs).

Manufacturers of programmable logic resources, such as Altera® Corporation of San Jose, Calif., have recently begun manufacturing programmable logic resources that, in addition to programmable logic circuitry, also include hardware DSP circuitry in the form of multiply-accumulate (MAC) blocks. The MAC blocks of programmable logic resources provide a way in which certain functionality of a user's design may be implemented using less space on the programmable logic resource, thus resulting in a faster execution time because of the nature of DSP circuitry relative to programmable logic circuitry. MAC blocks may be used in the processing of many different types of applications, including graphics applications, networking applications, communications applications, as well as many other types of applications.

MAC blocks are made of a number of multipliers, accumulators, and adders. The accumulators can perform add, subtract, or accumulate operations. Typically, there are four multipliers, two accumulators, and an adder in a MAC block. The MAC block can have a plurality of modes which may be selectable to provide different modes of operation.

MAC blocks are used to implement components of a user design that are appropriate for implementation in DSP that would otherwise require the use of a relatively large amount of programmable logic circuitry of the programmable logic resource. This allows the limited programmable logic circuitry of the programmable logic resource to be used for implementing more user design components than would otherwise be possible.

Typically, rounding and saturation circuitry for use with MAC blocks is implemented using the programmable logic circuitry of a programmable logic resource. This results in less programmable logic circuitry available for other components of a user design to be implemented within a particular programmable logic resource.

It would therefore be desirable to provide a programmable logic resource that makes more efficient use of its programmable logic circuitry and DSP circuitry.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a programmable logic resource that makes more efficient use of its programmable logic circuitry and DSP circuitry.

A MAC block is provided in which rounding and saturation capabilities are made available by using DSP resources within the MAC block. Rounding and saturation of multiplier outputs and of add-subtract-accumulate circuit (e.g., accumulator) outputs are provided by implementing within each of the respective components appropriate shifting circuitries, arithmetic circuitries, zeroing circuitries, truncation circuitries, data analysis circuitries, and/or any other suitable components in accordance with the present invention.

For example, in a multiplier where a 1.15 product output is desired, multiplication circuitry is used to generate an output that is left-shifted and added to a predetermined value in order to arrange bits appropriately to allow the 16 MSB to be obtained. The 16 MSB are preferably used for the 1.15 format rounded output.

Saturation is provided whereby the inputs to the multiplier are checked for overflow (i.e., when both inputs are −1 if in 1.15 format). If an overflow condition exists, then the saturation circuitry provides a predetermined saturated value as an output.

With respect to add-subtract-accumulate circuits, the present invention provides rounding capability within the respective components by providing appropriate circuitry for preparing a desired number of topmost bits of the output signal. For example, in an accumulator that normally outputs an 18.31 format value, a predetermined value may be added to coordinate the bits such that after zeroing (or truncating) the 16 LSB, an effective rounded 18.15 format output results.

Saturation is provided by circuitry that tests for the presence of an overflow or underflow condition (e.g., where the output of the add-subtract-accumulate circuit is greater than or equal to 1, or less than −1). If an overflow or underflow condition exists, then the saturation circuitry outputs an appropriate saturation value.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects of the present invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 is a block diagram of an illustrative MAC block in which four n bit by n bit multipliers are implemented as four n bit by n bit multipliers;

FIG. 2 is a block diagram of an illustrative MAC block in which four n bit by n bit multipliers are implemented as eight n/2 bit by n/2 bit multipliers;

FIGS. 3-5 are schematic diagrams of an illustrative multiplier having rounding and saturation capabilities in accordance with the present invention;

FIG. 6 is a schematic diagram of an illustrative add-subtract-accumulate circuit in accordance with the present invention;

FIG. 7 is a block diagram of an illustrative programmable logic resource having at least one MAC block in accordance with the present invention; and

FIG. 8 is a block diagram of an illustrative system employing a programmable logic resource in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In accordance with the invention, a multiplier-accumulator (MAC) block is provided in which multipliers and certain adders/subtracters (e.g., those used to implement accumulators) have rounding and saturating capabilities.

In FIG. 1, a vertically-arranged four multiplier-based organization of a MAC block is shown. Four multiplier circuits 136 may be stacked vertically to potentially operate in parallel. Each multiplier circuit 136 may include an n bits by n bits multiplier (e.g., 18 bit by 18 bit multiplier) to provide an n bits by n bits multiplication product. The inputs of each multiplier circuit may be fed up to n bits of information for the multiplicand and for the multiplier for the multiplier operation. Each multiplier circuit 136 may have an output that may be 2n-bits wide. Each multiplier circuit 136 may feed an output downstream that is the result of a multiplication operation. Each n bits by n bits multiplier circuit 136 may support two's complement signed or unsigned multiplication. Dynamic signed/unsigned control inputs 156 may receive input signals that control the sign of the multipliers and the multiplicands for the multiplier operations of multiplier circuits 136.

MAC block 192 may include three sets of register circuits. MAC block 192 may include input register circuits 134, pipeline register circuits, and output register circuit 154. If desired, additional pipeline register circuits may be included inside multiplier circuits 136, inside add-subtract-accumulate circuits 144, and/or inside add-subtract circuits 140 to increase speed. Output register circuit 154 may include approximately the same number of registers that are in input register circuits 134. The number of registers that are included in output register circuit 154 may be sufficient to register the output of MAC block 192 (e.g., register the output of MAC block 192 for all of the modes that are supported by MAC block 192). The number of output registers may be less than, equal to, or greater than the number of the input registers depending on what implementation or architecture is being used for MAC block 192 or depending on the range of functionality that is being provided by MAC block 192.

For clarity and brevity, pipeline register circuits are not shown in FIG. 1 and are not shown in some of the other FIGS. described herein. As mentioned above, input register circuits 134, pipeline register circuit, or output register circuit 154 may be included in MAC block 192 if desired. Independent sets of clock and clear signals 158 may be provided for input register circuits 134, the pipeline register circuit, or output register circuit 154. Two sets of clock and clear signals 158 may be provided for the input register circuits 134 and the pipeline register circuits, and two sets may be provided for output register circuit 154. Input register circuits 134 may include scan chains and may include additional circuitry to be used with the scan chains to allow the scan chains to be used as logic in some digital signal processing functions such as in providing FIR filters. Input register circuits 134 may include 8n registers (e.g., 144 registers) for 8n data inputs and q registers (e.g., 4 registers) for signed/unsigned control of multiplier circuits 136 and for add-subtract control of add-subtract-accumulate circuits 144. Each register may have programmable inversion capability to provide logic inversion, when desired, or to invert unused bits of register inputs when an input for a multiplier has fewer than n bits.

Output register circuit 154 may have feedback paths 161 to add-subtract-accumulate circuits 144 for accumulation operations. Any one of the three sets of registers, input register circuit 134, the pipeline register circuit, and output register circuit 154 may be bypassed using programmable logic connectors (“PLCs”) in those circuits that may be controlled by random access memory control. The pipeline register circuit may include approximately the same number of registers as input register circuits 134.

Interface circuitry 133 shown to the left of MAC block 192 may feed the inputs of MAC block 192, which may be the inputs of input register circuits 134. Input register circuits 134 may include eight input registers that each have n bit inputs and that feed the inputs of the four n bit by n bits multiplier circuits 136.

Add-subtract-accumulate circuits 144 may have connections for receiving inputs from multiplier circuits 136 and from return paths 161. If desired, add-subtract-accumulate circuits 144 may be configured to pass the outputs from multiplier circuits 136 to adder circuit 140. The outputs of multiplier circuits 136 may be routed to output selection circuit 152 or output register circuit 154 without being routed through add-subtract-accumulate circuits 144 and/or add-subtract circuit 140. For the purposes of clarity and brevity and not by way of limitation and without loss of generality, add-subtract circuit 140 is described herein primarily in the context of an adder circuit. Add-subtract-accumulate circuits 144 may each be configured to perform a two's complement addition of two 2n bit inputs to produce a 2n+1 bit output. Add-subtract-accumulate circuits 144 may each be configured to perform a two's complement subtraction of two 2n bit inputs to produce a 2n+1 bit output. Add-subtract-accumulate circuits 144 may each be configured to perform an accumulation of one 2n bit input with an n+y bit output. Dynamic add/subtract control inputs 162 and 164 may be inputs to add-subtract-accumulate circuits 144 that are used to switch between addition and subtraction operations and to handle complex multiplications. Dynamic add/subtract inputs 162 and 164 may be needed for complex multiplications, which are multiplications involving complex numbers. Complex multiplication of two complex numbers may sometimes involve both an addition operation and a subtraction operation.
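By way of illustration only, the following Python sketch shows why a complex multiplication may call for both a subtraction and an addition. The function name and the mapping of the four products onto four multipliers and two add-subtract units are assumptions made for this sketch and are not taken from the figures.

```python
def complex_multiply(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Compute (a + b*i) * (c + d*i) using four real products.

    Hypothetical mapping: each product stands in for one multiplier,
    one add-subtract unit is switched to subtract (real part) and the
    other to add (imaginary part).
    """
    p_ac = a * c
    p_bd = b * d
    p_ad = a * d
    p_bc = b * c
    real = p_ac - p_bd   # subtraction operation
    imag = p_ad + p_bc   # addition operation
    return real, imag
```

For instance, `complex_multiply(1, 2, 3, 4)` returns `(-5, 10)`, matching (1 + 2i)(3 + 4i) = −5 + 10i.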

The outputs of add-subtract-accumulate circuits 144 may be routed to output selection circuit 152 or output register 154 without being routed through adder circuit 140. If desired, adder circuit 140 may be configured to pass inputs from add-subtract-accumulate circuits 144 (e.g., n+1 bit output of two's complement addition, n+y bit output of accumulation, etc.). Adder circuit 140 may have an output that is the resultant of the addition of the outputs from add-subtract-accumulate circuits 144. Output selection circuit 152 may have inputs that are from adder circuit 140. Output selection circuit 152 may select which ones of the inputs of output selection circuit 152 are passed to output register circuit 154. Output register circuit 154 may feed the inputs of interface circuitry 133 shown to the right of MAC block 192. The percent of local interconnect resources that is allocated for connecting the circuits in MAC block 192 increases as the complexity and the variations in digital signal processing functionality increase from left to right in MAC block 192.

With reference to FIG. 1, the “top half” of MAC block 192 may include, among other components, the two multipliers 136 and add-subtract-accumulate circuit 144 shown at the top of MAC block 192. The “bottom half” of MAC block 192 may include, among other components, the two multipliers 136 and add-subtract-accumulate circuit 144 shown at the bottom of MAC block 192.

MAC block 192 may be configured to have an n/2 bits by n/2 bits multiplier based organization. For example, with reference now to FIG. 2, MAC block 192 may include multiplier circuits 136 that are configured to include eight n/2 bits by n/2 bits multipliers. The eight n/2 bits by n/2 bits multipliers may be configured from the four n bits by n bits multipliers of multiplier circuits 136 of FIG. 1.

If desired, MAC block 192 may be implemented to be able to be configured to have a p bits by p bits multiplier based organization and to have one or more p/m bits by p/m bits multiplier based organizations where p, m, and p/m are integers. As mentioned above, this architecture is at least partially based on the limitations of the local interconnect resources. The different organizations may be selectable and MAC block 192 may be capable of being configured into some or all of the p/m bits by p/m bits multiplier based organizations.

MAC block 192 may include add-subtract-accumulate circuits 144 configured to provide four add or subtract units. Each add or subtract unit may perform an addition-based operation on two n bit inputs and have an n+1 bit output. If desired, add-subtract-accumulate circuits 144 may be configured to pass the outputs of the n/2 bits by n/2 bits multiplier operation. The outputs of multiplier circuits 136 may be routed to output selection circuit 152 or output register circuit 154 without being routed through add-subtract-accumulate circuits 144 or adder circuit 140. Add-subtract-accumulate circuits 144 may produce the resultant of the addition (or subtraction) of particular output pairs of the n/2 bits by n/2 bits multiplier operation.

MAC block 192 may include adder circuit 140 configured to provide two adders. If desired, adder circuit 140 may pass the inputs that are fed to adder circuit 140 from add-subtract-accumulate circuits 144. The outputs of add-subtract-accumulate circuits 144 may be routed to output selection circuit 152 or output register circuits 154 without being routed through adder circuit 140. Adder circuit 140 may produce two outputs that are the resultants of the addition of particular pairs of outputs from add-subtract-accumulate circuits 144.

The local interconnect resources of MAC block 192 may be configurable to implement the n/2 bits by n/2 bits multiplier based organization with the same input/output interface circuitry 133 and supporting circuitry (e.g., multiplier circuits 136, adder circuit 140, etc.) as the n bits by n bits multiplier based organization. The local interconnect resources of MAC block 192 may be configured to include some butterfly cross connection patterns for forming appropriate interconnections in the n/2 bits by n/2 bits multiplier based organization.

The butterfly cross connection patterns are implemented for select interconnections between input register circuits 134 and multiplier circuits 136. The butterfly cross connection patterns may be used to have the n/2 higher order bits of pairs of n bit inputs multiplied together and to have the n/2 lower order bits of pairs of n bit inputs multiplied together. The butterfly cross connection patterns are implemented for select interconnections between multiplier circuits 136 and add-subtract-accumulate circuits 144. As mentioned above, add-subtract-accumulate circuits 144 may be configured to include four add (or subtract) units. Each add (or subtract) unit may have two n bit inputs from multiplier circuits 136. The butterfly cross connection patterns may be used to have the two inputs of each add (or subtract) unit be either the resultant of the multiplication of the higher order bits by the multipliers of multiplier circuits 136 or the resultant of the multiplication of the lower order bits by the multipliers of multiplier circuits 136. The butterfly cross connection patterns may also be used in the interconnect between add-subtract-accumulate circuits 144 and adder circuit 140. Adder circuit 140 may be split into two adders (e.g., two independent adders). The butterfly cross connection pattern may be used to feed the resultant of operations on higher order bits to a top half of adder circuit 140 and to feed the resultant of operations on lower order bits to a bottom half of adder circuit 140. In the n/2 bits by n/2 bits multiplier based organization, accumulator functionality may not be available. Accumulator functionality may not be available because the resources of MAC block 192 may be substantially consumed in allowing for the implementation of the n/2 bits by n/2 bits multiplier based organization.

The butterfly cross connection patterns are exemplary of techniques for decomposing a single multiplier circuit into multiple smaller multiplier circuits, exemplary of techniques for managing data so that the outputs of the multiple smaller multiplier circuits are appropriately added together (e.g., adding lower order bits to lower order bits), or exemplary of techniques for managing data to compensate for limitations in the resources of a MAC block. Such cross connect patterns may be used to handle connections because of the way that circuitry for a MAC block was laid down or because of the arrangement that was selected for the circuitry. The butterfly cross connection patterns are provided as an illustrative example. Other techniques may also be used. For example, the n bits by n bits multipliers may be decomposed in a different way that eliminates the need for the butterfly cross connection patterns or decomposed in a way that may require different types of cross connect patterns. Accordingly, other cross connection or connection patterns may be used to implement MAC block 192.

The flexibility and configurability of MAC block 192 may support the configuration of a set of modes of operation. If desired, MAC block 192 of FIG. 1 and MAC block 192 of FIG. 2 may each be a separate embodiment of a MAC block with each having its own set of modes of operation. In some embodiments, MAC block 192 may be configurable between having an n bits by n bits multiplier based organization or an n/2 bits by n/2 bits multiplier based organization and having modes of operation that are associated with each. The modes of MAC block 192 may be configured with memory bits to make the modes available to users.

The present invention is primarily described herein in terms of a MAC block having four 18 bit by 18 bit multipliers with two add-subtract-accumulate circuits and one second stage adder arranged as illustrated in FIGS. 1 and 2. It will be understood that this is merely an illustrative arrangement and that the present invention may be practiced with any other suitable MAC block having any suitable types of components arranged in any suitable arrangement.

A MAC block can be selected to operate in any suitable mode of operation. For example, for a MAC block having four 18 bit by 18 bit multipliers, where each multiplier can generate a 36 bit output that is the product of two 18 bit multiplicand inputs or two products (concatenated into a 36 bit product) of two pairs of 9 bit multiplicand inputs (concatenated into one pair of 18 bit inputs), suitable modes of operation include, for example, an 18 bit by 18 bit multiplier, a 52 bit accumulator (e.g., multiply-and-accumulate), a sum of two 18 bit by 18 bit multipliers, a sum of four 18 bit by 18 bit multipliers, a 9 bit by 9 bit multiplier, a sum of two 9 bit by 9 bit multipliers, a sum of four 9 bit by 9 bit multipliers, a 36 bit by 36 bit multiplier, or other suitable modes. It will be understood that these are merely illustrative modes that may be supported by a MAC block in accordance with the present invention. Other suitable modes may be supported. Those modes listed above will be referred to herein as modes 1-8, respectively. Such support of modes may be determined based on any suitable factors, including, for example, application needs, size of available multipliers, number of multipliers, or other suitable factors. For example, it is clear that if a MAC block included eight 9 bit by 9 bit multipliers, different modes may be used (e.g., sum of eight 9 bit by 9 bit multipliers).
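As a rough illustration of the split arrangement, the following Python sketch treats one 18 bit by 18 bit multiplier as two 9 bit by 9 bit multipliers whose inputs and products are concatenated. The function name is hypothetical, unsigned operands are assumed for simplicity, and the placement of the higher order product in the upper 18 bits of the 36 bit output is an assumption of this sketch.

```python
NINE_BIT_MASK = 0x1FF  # nine ones

def dual_9x9_multiply(a18: int, b18: int) -> int:
    """Multiply the high halves together and the low halves together.

    Each 18 bit input carries two unsigned 9 bit operands. Each
    9 x 9 product fits in 18 bits (511 * 511 < 2**18), so the two
    products can be concatenated into a single 36 bit word.
    """
    a_hi, a_lo = (a18 >> 9) & NINE_BIT_MASK, a18 & NINE_BIT_MASK
    b_hi, b_lo = (b18 >> 9) & NINE_BIT_MASK, b18 & NINE_BIT_MASK
    hi_product = a_hi * b_hi   # higher order bits multiplied together
    lo_product = a_lo * b_lo   # lower order bits multiplied together
    return (hi_product << 18) | lo_product
```

Packing the operand pairs (3, 5) and (7, 11) as `(3 << 9) | 5` and `(7 << 9) | 11` gives an output whose upper 18 bits hold 21 and whose lower 18 bits hold 55.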

One common DSP number representation is the 1.15 format. The 1.15 format is a fixed-point number representation in which 16 bits are used to represent values from −1 to (1−the least significant bit (“LSB”)). The most significant bit (“MSB”) represents the sign bit and the rest of the bits represent the fractional component. A MAC block implemented in accordance with the present invention supports rounding and saturation of 1.15 format numbers within any or all of its respective multipliers as well as within any or all of its respective add-subtract-accumulate circuits (sometimes referred to herein as an “accumulator”).
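The 1.15 format may be illustrated with the following minimal Python sketch, which converts between 16 bit words and real values. The helper names are hypothetical, and the clamping in `float_to_q15` is an assumption of this sketch rather than a behavior described herein.

```python
def q15_to_float(word: int) -> float:
    """Interpret a 16 bit word as a two's complement 1.15 value."""
    word &= 0xFFFF
    if word & 0x8000:        # MSB is the sign bit
        word -= 0x10000
    return word / 32768.0    # 15 fractional bits

def float_to_q15(value: float) -> int:
    """Quantize a real value to the nearest 1.15 word.

    Values outside [-1, 1 - 2**-15] are clamped to the nearest
    representable word (an assumption made for this sketch).
    """
    scaled = int(round(value * 32768.0))
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF
```

Here `0x8000` decodes to −1.0 and `0x7FFF` decodes to 1 − 2^−15, the two ends of the representable range.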

It will be understood that the rounding and saturation features of the present invention may be implemented in multipliers and in add-subtract-accumulate circuits when those components are used to implement any suitable mode of operation. For example, rounding and saturation may be provided in add-subtract-accumulate circuits in an accumulate mode, or in any other suitable mode, such as a sum of two 18 bit by 18 bit multiplier mode. If desired, rounding and saturation may also be provided in other arithmetic circuitry, such as in second stage adder 140 (FIGS. 1 and 2).

Rounding and saturation in multipliers and in add-subtract-accumulate circuits (i.e., according to the present invention) may be supported among any or all modes of operation of a MAC block. For example, in one suitable arrangement, rounding and saturation in the multipliers may be supported in modes 1 to 4; rounding in the add-subtract-accumulate circuit may be supported in modes 2 to 4; saturation in the add-subtract-accumulate circuit may be supported in mode 2. These restrictions are illustrative restrictions that may result as a consequence of a particular implementation of a MAC block and of the present invention. It will be understood that any suitable implementations may be used and that as a result any suitable restrictions may ensue. Restrictions may also be made by, for example, a user design or by the manufacturer of the programmable logic device for any suitable reasons. It will also be understood that saturation may be provided for a particular multiplier or add-subtract-accumulate circuit but not rounding. Rounding may be provided for a particular multiplier or add-subtract-accumulate circuit but not saturation. Any such suitable design may be implemented.

It will be understood that multiplication of two 1.15 format numbers produces a 2.30 product. Because the two 1.15 format numbers are in the range of −1 to 1, only one sign bit need preferably be used. The multiplication product is therefore preferably left shifted by 1 bit resulting in a 1.31 number in which the LSB of the shifted product is zero. In one suitable approach, there need not be an actual left shift in the hardware implementation of the left shift. Rather, instead of taking the 1.31 product on the 32 MSB of the multiplier output bus, the 1.31 product is located on bits [34:3] of the output bus. This is merely an illustrative optimization that need not be implemented (i.e., an actual left shift may be implemented). Also, any suitable bits of the output bus may be used besides [34:3].
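The 2.30 to 1.31 conversion may be sketched in Python as follows. The function names are hypothetical, and the sketch models an explicit left shift rather than the bit placement on the output bus.

```python
def to_signed(word: int, bits: int) -> int:
    """Interpret the low `bits` bits of `word` as two's complement."""
    word &= (1 << bits) - 1
    return word - (1 << bits) if word & (1 << (bits - 1)) else word

def multiply_q15_to_q31(a_word: int, b_word: int) -> int:
    """Multiply two 1.15 words and return the 32 bit 1.31 product."""
    product_2_30 = to_signed(a_word, 16) * to_signed(b_word, 16)
    shifted = product_2_30 << 1     # drop the redundant sign bit
    return shifted & 0xFFFFFFFF     # LSB of the result is always zero
```

For example, 0.5 × 0.5 (`0x4000` by `0x4000`) yields `0x20000000`, which is 0.25 in 1.31 format. Multiplying `0x8000` by `0x8000` yields `0x80000000`, which reads as −1 in 1.31 format rather than +1.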

In many DSP applications, a rounded 1.15 format 16 bit product is desired. Thus, users are interested in the top 16 bits (i.e., the 16 MSB) of the shifted product. One way in which this desired result may be obtained is by adding the value 0x00008000 to the shifted product so that the 16 LSB of the shifted product may then be set to zero, resulting in an unbiased rounded 1.15 format result.
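The add-then-zero rounding step may be sketched in Python as follows. The constant and function names are hypothetical. Wrapping the sum to 32 bits is an assumption of this sketch; it does not model any interaction with saturation.

```python
ROUND_CONSTANT = 0x00008000   # one half of the LSB of the 16 MSB

def round_q31_to_q15(q31_word: int) -> int:
    """Add the rounding constant, then zero the 16 LSB."""
    rounded = (q31_word + ROUND_CONSTANT) & 0xFFFFFFFF
    return rounded & 0xFFFF0000
```

A 1.31 word of `0x12347FFF` rounds down to `0x12340000`, while `0x12348000` rounds up to `0x12350000`. The 16 MSB of the result may then be read as the rounded 1.15 value.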

In one suitable approach, instead of zeroing the 16 LSB of the multiplication product, a truncation may be performed whereby the 16 LSB are truncated to generate a rounded and truncated 1.15 result. A separate truncate signal may be used whereby round signal 302 and the truncate signal may be ORed together in order to control the operation of zeroing circuitry 328. Alternatively, truncation may be the only method of rounding provided. For purposes of clarity and brevity, the present invention is primarily described herein in terms of zeroing the 16 LSB. It will be understood that truncating may be implemented in place of or in addition to the zeroing approach.

A special case in 1.15 format multiplication occurs when multiplying 0x8000 (i.e., −1) by 0x8000 (i.e., −1). It will be understood that the result (i.e., 1) cannot be represented in the 1.31 format. Instead, the 1.31 format multiplier product is preferably set to 0x7FFFFFFF (i.e., 1−LSB) if saturation is enabled. If not enabled, then circuitry responsible for rounding and for saturating is preferably bypassed.
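The special case may be sketched in Python as follows. The function name and the returned overflow flag are hypothetical conveniences of this sketch. Detecting the condition by comparing both inputs against 0x8000 follows the input check described herein.

```python
Q15_MINUS_ONE = 0x8000
Q31_MAX = 0x7FFFFFFF   # 1 - LSB in 1.31 format

def saturate_product(a_word: int, b_word: int, product_q31: int,
                     saturate: bool = True) -> tuple[int, bool]:
    """Return (output word, overflow flag) for a 1.31 product."""
    both_minus_one = ((a_word & 0xFFFF) == Q15_MINUS_ONE and
                      (b_word & 0xFFFF) == Q15_MINUS_ONE)
    if saturate and both_minus_one:
        return Q31_MAX, True
    # Saturation disabled or no overflow: pass the product through.
    return product_q31 & 0xFFFFFFFF, False
```

With saturation enabled, the unrepresentable product of −1 and −1 is replaced by `0x7FFFFFFF`. With saturation disabled, the wrapped value `0x80000000` passes through unchanged.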

Enabling saturation generates an overflow bit. The overflow bit may be located at any suitable bit location in the product (in the case of saturation in a multiplier) or sum (in the case of saturation in an add-subtract-accumulate circuit). For example, the overflow bit may be located on the LSB of the product. The overflow bit may be located at a different bit location depending on any suitable factor, such as current mode of operation.

With regard to rounding and saturation in the add-subtract-accumulate circuit, if the multiplication product is located in bits [34:3] of the multiplier output bus, a 52 bit accumulator would have 49 bits of precision. This provides up to 131072 (i.e., 2^17) accumulation cycles as opposed to 1048576 (i.e., 2^20) provided by a product located in bits [31:0]. It will be appreciated, however, that outputs may be located on any suitable bits even if fewer accumulation cycles are provided. Such design decisions may be based on any suitable design and application criteria.
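The cycle counts above follow from the number of accumulator bits left above the MSB of the product. A minimal Python sketch of that arithmetic follows; the function names are hypothetical.

```python
def guard_bits(accumulator_bits: int, product_msb: int) -> int:
    """Accumulator bits that lie above the product's MSB position."""
    return accumulator_bits - (product_msb + 1)

def accumulation_cycles(accumulator_bits: int, product_msb: int) -> int:
    """Each guard bit doubles the number of supported accumulations."""
    return 1 << guard_bits(accumulator_bits, product_msb)
```

A product on bits [34:3] of a 52 bit accumulator leaves 17 guard bits, and a product on bits [31:0] leaves 20.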

When rounding is activated, 0x000008000 is added to the accumulator and the 16 LSB of the accumulator result are set to zero.

When saturation is activated, the accumulator value is set to either the maximum (0x000007FFFFFFF) in case of overflow or the minimum (0x1FFFF80000000) in case of underflow. The accumulator's overflow bit may be located on any suitable bit of the output bus (e.g., LSB, bit [2], etc.). The 49 accumulator bits may be located on the accumulator output bus's bits [51:3].
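The two saturation values correspond to 2^31 − 1 and −2^31 in 49 bit two's complement, which a short Python sketch can confirm. The names below are hypothetical. The thresholds assume the overflow and underflow conditions are results greater than or equal to 1 or less than −1, with 31 fractional bits.

```python
ACC_BITS = 49
ACC_MASK = (1 << ACC_BITS) - 1
ACC_MAX = 0x000007FFFFFFF   # largest value below 1 with 31 fractional bits
ACC_MIN = 0x1FFFF80000000   # -1 in 49 bit two's complement

def acc_to_signed(word: int) -> int:
    """Interpret a 49 bit accumulator word as a signed integer."""
    word &= ACC_MASK
    return word - (1 << ACC_BITS) if word >> (ACC_BITS - 1) else word

def saturate_accumulator(word: int) -> tuple[int, bool]:
    """Clamp the accumulator to [-1, 1 - LSB]; flag when clamped."""
    value = acc_to_signed(word)
    if value >= (1 << 31):        # overflow: result >= 1
        return ACC_MAX, True
    if value < -(1 << 31):        # underflow: result < -1
        return ACC_MIN, True
    return word & ACC_MASK, False
```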

The present invention will now be described with reference to FIGS. 3-6. FIG. 3 shows an illustrative multiplier 300 implemented in accordance with one embodiment of the present invention. Inputs 306 and 308 are multiplied using multiplication circuitry 310. Output 312 of multiplication circuitry 310 is in a 2.30 format. Output 312 is left-shifted using shifting circuitry 314 to produce output 316 having a 1.31 format (i.e., because only a single sign bit is needed).

If round signal 302 indicates that rounding is to be activated, then 1.31 format signal 316 is added with 0x00008000 (i.e., via input 318) using adder 320. This is done in order to add 1 to the 16th MSB of the fractional part of the 1.31 format product (i.e., signal 316) when the product is represented over bits [31:0]. Output 322 of adder 320 is in a 1.31 format. If saturation signal 304 indicates that an overflow condition is to be checked and dealt with, then saturation circuitry 324 checks whether inputs 306 and 308 are 0x8000 (−1) and 0x8000 (−1). If so, saturation takes place in which output 326 is set to 0x7FFFFFFF (i.e., 1−LSB). This avoids having to make the impossible representation of the value 1 using 1.15 (or 1.31) format.

Output 326 (i.e., in 1.31 format) of saturation circuitry 324 is then input into zeroing circuitry 328. If round signal 302 indicates that rounding is to take place, then at circuitry 328, rounding takes place by zeroing the 16 LSB of the 1.31 format value represented by signal 326. Output 330 of zeroing circuitry 328 then provides a 1.31 signal in which the 16 LSB are zero, effectively representing a 1.15 format.
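The signal flow of FIG. 3 may be modeled behaviorally with the following Python sketch. The function name and the boolean arguments standing in for round signal 302 and saturation signal 304 are assumptions of this sketch, and the local variable names merely echo the reference numerals. It is not a description of the actual circuitry.

```python
def multiplier_300(in_306: int, in_308: int,
                   round_302: bool, saturate_304: bool) -> int:
    """Behavioral model: multiply, shift, add, saturate, zero."""
    def signed16(word: int) -> int:
        word &= 0xFFFF
        return word - 0x10000 if word & 0x8000 else word

    out_312 = signed16(in_306) * signed16(in_308)   # 2.30 product
    out_316 = (out_312 << 1) & 0xFFFFFFFF           # 1.31 after shift
    # Adder 320 adds the rounding constant only when rounding is on.
    out_322 = (out_316 + 0x00008000) & 0xFFFFFFFF if round_302 else out_316
    # Saturation circuitry 324 inspects the inputs, not the product.
    overflow = (in_306 & 0xFFFF) == 0x8000 and (in_308 & 0xFFFF) == 0x8000
    out_326 = 0x7FFFFFFF if (saturate_304 and overflow) else out_322
    # Zeroing circuitry 328 clears the 16 LSB only when rounding is on.
    out_330 = out_326 & 0xFFFF0000 if round_302 else out_326
    return out_330
```

With both features enabled, −1 × −1 produces `0x7FFF0000`, that is, the saturated value with its 16 LSB zeroed. With both disabled, the same inputs produce the wrapped value `0x80000000`.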

FIG. 4 is an illustrative block diagram of a portion of adder 320 in accordance with the present invention when rounding is not activated. FIG. 4 shows 10 partial product terms 400 (i.e., because of the preferable ability to split the 18 bit by 18 bit multiplier of which this circuitry is part into two 9 bit by 9 bit multipliers). Partial product terms 400 are preferably input into full adders 402 and 406 and half adders 404 and 408.

Input 410 is an effectively non-existent round input. That is, because FIG. 4 illustrates circuitry of adder 320 when there is no rounding, adder 404 does not receive any round-based input, making adder 404 an effective half adder.

When rounding is activated, then, as illustrated in FIG. 5, adder 404 becomes an effective full adder because input 512 represents signal 318 (i.e., 0x00008000).
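The distinction between the two cases may be illustrated with a one bit adder cell in Python. The function name is hypothetical, and modeling the round input as a third single bit operand that defaults to zero is an assumption of this sketch.

```python
def adder_cell(a: int, b: int, round_in: int = 0) -> tuple[int, int]:
    """One bit adder returning (sum, carry).

    With round_in tied to 0 the cell behaves as a half adder; with a
    round bit supplied it behaves as a full adder.
    """
    total = a + b + round_in
    return total & 1, total >> 1
```

With rounding off, `adder_cell(1, 1)` returns `(0, 1)`. With the round bit supplied, `adder_cell(1, 1, 1)` returns `(1, 1)`.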

FIG. 6 shows a block diagram of an illustrative accumulator having rounding and saturation capabilities in accordance with the present invention. Adder/subtracter 606 takes as input signals 602 and 604 corresponding to a product from a multiplier and an accumulator value derived from a previous accumulation cycle, respectively. Signal 602 is preferably a 1.31 format representation. Signal 604 is preferably an 18.31 format representation (i.e., because the multiplication product is preferably located in bits [34:3] of the multiplier output bus, the accumulator has 49 bits of precision).

When rounding is activated (i.e., as indicated by round signal 622), 0x00008000 is added to the accumulator (i.e., because most applications are interested in the 16 MSB). This is shown in FIG. 6 by adder 612 adding together signal 608, representing the unrounded, unsaturated accumulator result with signal 610, representing 0x00008000. The 16 LSB are then set to zero using zeroing circuitry 620.

Unlike saturation in multipliers, where because of the nature of the 1.15 format the only case that produces difficulty is −1 multiplied by −1, with accumulation it will be appreciated that overflow and underflow conditions may exist when the accumulation result is greater than or equal to 1 or less than −1.

If saturation is enabled (i.e., based on saturation signal 624), then saturation circuitry 616 tests for overflow and underflow conditions (i.e., where the accumulator result is greater than or equal to 1, or less than −1, respectively). If an overflow or underflow is encountered, then saturation circuitry 616 sets output 618 to the maximum (i.e., 0x000007FFFFFFF) or minimum (i.e., 0x1FFFF80000000), respectively. It will be understood that the overflow and underflow conditions tested under control of saturation signal 624 are different from the overflow output bit in the accumulator. If no underflow or overflow is found, then signal 618 carries substantially the same value as signal 614.

If rounding is enabled, then zeroing circuitry 620 zeros the 16 LSB of the value represented by input signal 618 to produce an 18.31 format output 622 in which the 16 LSB are zero. Thus, the output is effectively in an 18.15 format.
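Taken together, one accumulation cycle of FIG. 6 might be modeled as below. This is a rough sketch under several assumptions: values are signed Python integers in units of 2^-31, only the add mode of adder/subtracter 606 is modeled, signal 608 is taken to be the output of adder/subtracter 606, 49-bit wraparound is ignored, and the names accumulate_cycle and as_18_15 are invented for the example.

```python
FRAC_BITS = 31
ONE = 1 << FRAC_BITS  # the value 1 in an x.31 representation


def accumulate_cycle(product: int, previous: int,
                     round_enabled: bool, saturate_enabled: bool) -> int:
    """One add-round-saturate-zero pass, in the order described for FIG. 6."""
    total = previous + product          # adder/subtracter 606 (add mode only)
    if round_enabled:
        total += 0x00008000             # adder 612 adding signal 610
    if saturate_enabled:
        # saturation circuitry 616: clamp to [-1, 1 - 2**-31]
        total = max(-ONE, min(ONE - 1, total))
    if round_enabled:
        total &= ~0xFFFF                # zeroing circuitry 620
    return total


def as_18_15(value_18_31: int) -> int:
    """Drop the 16 zeroed LSB to view the result as an 18.15 value."""
    return value_18_31 >> 16


if __name__ == "__main__":
    out = accumulate_cycle(0x7FFF0000, 0x7FFF0000, True, True)
    print(hex(out), hex(as_18_15(out)))
```

Because zeroing follows saturation in this ordering, a saturated maximum leaves the model with its 16 LSB cleared as well, which is consistent with the output being read as an 18.15 value.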

As was discussed above, users are often interested in the 16 MSB of an output. This is with respect to the fractional component of an x.15 format representation of a value. The variable “x” may be any suitable integer that represents the number of bits representing a whole number value, which, when added to the fractional component, produces the value represented by x.15.
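As a concrete reading of the x.15 notation, the following Python sketch decodes a bit pattern with x whole-number bits and 15 fractional bits into its numeric value. It assumes a two's complement encoding in which the most significant of the x bits acts as the sign bit; the function name decode_x15 is hypothetical.

```python
def decode_x15(pattern: int, int_bits: int) -> float:
    """Numeric value of an x.15 pattern, where x == int_bits.

    Assumes two's complement, with the top bit of the x whole-number
    bits serving as the sign bit.
    """
    total_bits = int_bits + 15
    pattern &= (1 << total_bits) - 1
    if pattern >> (total_bits - 1):      # sign bit set: negative value
        pattern -= 1 << total_bits
    return pattern / 2**15               # 15 fractional bits


if __name__ == "__main__":
    print(decode_x15(0x4000, 1))    # 1.15 pattern
    print(decode_x15(0x8000, 1))    # 1.15 pattern with only the sign bit set
    print(decode_x15(0x18000, 18))  # 18.15 pattern
```

Under this reading, a 1.15 value spans the range from −1 up to just under 1, which is why a result equal to 1 cannot be represented and must be saturated.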

It will be understood that although the present invention is described herein predominantly in terms of 1.15 format inputs and outputs, 1.31 format intermediate values, 18.31 format values, 18.15 format values, etc., the present invention may be applied using any other suitable representation of values. For example, the predetermined values (e.g., corresponding to signals 610 and 318) used may be adjusted based on which format is being used; zeroing circuitries 622 and 328 may be adjusted to produce any suitable format output; and any other suitable modifications may be made to accommodate any desired representation of values in accordance with the present invention.

It will be appreciated that each multiplier of a MAC block and that each add-subtract-accumulate circuit of a MAC block may be implemented with the rounding and saturation capabilities described above. For example, separate signals 622 and 624 may be used for each distinct multiplier and add-subtract-accumulate circuit. Alternatively, only some of these components may be implemented having rounding and saturation capabilities.

FIG. 7 is a simplified block diagram of a programmable logic resource 700 having one or more MAC blocks 702 configured in accordance with the present invention. Programmable logic resource 700 may have any suitable interconnection circuitry, memory circuitry, and programmable logic circuitry to allow programmable logic resource 700 to implement user designs and to make use of MAC blocks 702 in implementing the user designs.

FIG. 8 illustrates a programmable logic resource 700 (FIG. 7) of this invention (i.e., having at least one multiplier configured with the mode splitting features of the present invention) in a data processing system 800 in accordance with one embodiment of the present invention. Data processing system 800 may include one or more of the following components: a processor 802; memory 804; I/O circuitry 806; and peripheral devices 808. These components are coupled together by a system bus 810 and are populated on a circuit board 812 which is contained in an end-user system 814.

System 800 may be used in a wide variety of applications, such as computer networking, data networking, instrumentation, video processing, DSP, or any other application where the advantage of using programmable or reprogrammable logic is desirable. Programmable logic resource 700 may be used to perform a variety of different logic functions. For example, programmable logic resource 700 may be configured as a processor or controller that works in cooperation with processor 802. Programmable logic resource 700 may also be used as an arbiter for arbitrating access to a shared resource in system 800. In yet another example, programmable logic resource 700 may be configured as an interface between processor 802 and one of the other components in system 800.

Thus, saturation and rounding in a MAC block is provided. One skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow.

1. A programmable logic resource comprising digital signal processing circuitry, the digital signal processing circuitry comprising: at least one circuit element configured with the capability to perform rounding and saturation on a respective output of the at least one circuit element.
2. The programmable logic resource of claim 1 wherein the at least one circuit element comprises at least one multiplier.
3. The programmable logic resource of claim 1 wherein the at least one circuit element comprises at least one add-subtract-accumulate circuit.
4. The programmable logic resource of claim 1 wherein the at least one circuit element comprises at least one multiplier and at least one add-subtract-accumulate circuit.
5. The programmable logic resource of claim 1 wherein the digital signal processing circuitry is a multiply-accumulate block.
6. The programmable logic resource of claim 1 wherein the at least one multiplier is configured with the capability to perform rounding and saturation based on a 1.15 format.
7. A printed circuit board on which is mounted a programmable logic resource as defined in claim 1.
8. The printed circuit board of claim 7 further comprising a memory mounted on the printed circuit board and coupled to the programmable logic resource.
9. The printed circuit board of claim 8 further comprising processing circuitry mounted on the printed circuit board and coupled to the memory circuitry.